By César Pérez
In this notebook I'll describe a test to compare the performance of two python libraries to manage SQL, sqlite3 and pyodbc. For this exercise, I am going to focus on INSERT and SELECT statements. I've created two auxiliary scripts to increasingly add rows and columns to an SQLite and MSSQL databases and capture the timestamps just before and after the task execution to measure the time taken to complete. Both scripts create or update a CSV file having values such as: operation performed, rows added, existing rows, start / end times and their differece. In this notebook, I won't directly describe the scripts that I've used to test the performace but they are available in the links bellow.
Processor: Intel(R) Core(TM) i5-10300H CPU \@ 2.50GHz 2.50 GHz
RAM: 16.0 GB (15.8 GB)
OS: Windows 10 Home x64
Python Libraries: sqlite3, pyodbc (4.0.35)
Python version: 3.9.13
Even if we want to perform the same operation on both databases, the codes cannot be exacly the same. For the SQLite script I've used the method executescript(), and for the MSSQL script I've used the method executemany()
To analyse the performance, I'm going to:
#1. Env prepare
import pandas as pd
import seaborn as sns
# 2. Read and prepare data
SQLite_data = pd.read_csv('report_SQLite.csv')
SQLite_data['source'] = 'SQLite'
SQLite_data.head()
cols | rows | inserted_rows | start_op | end_op | time_diff | operation | file_size(KB) | source | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 100 | 100 | 2022-07-31 18:03:56.346697 | 2022-07-31 18:04:00.048014 | 3.701317 | INSERT | 160.0 | SQLite |
1 | 1 | 100 | 100 | 2022-07-31 18:04:00.063387 | 2022-07-31 18:04:00.063387 | 0.000000 | SELECT | 160.0 | SQLite |
2 | 1 | 300 | 200 | 2022-07-31 18:04:00.063387 | 2022-07-31 18:04:07.452064 | 7.388677 | INSERT | 160.0 | SQLite |
3 | 1 | 300 | 200 | 2022-07-31 18:04:07.452064 | 2022-07-31 18:04:07.467377 | 0.015313 | SELECT | 160.0 | SQLite |
4 | 1 | 600 | 300 | 2022-07-31 18:04:07.467377 | 2022-07-31 18:04:18.789870 | 11.322493 | INSERT | 160.0 | SQLite |
MSSQL_data = pd.read_csv('report_MSSQL.csv')
MSSQL_data['source'] = 'MSSQL'
MSSQL_data.head()
cols | rows | inserted_rows | start_op | end_op | time_diff | operation | source | |
---|---|---|---|---|---|---|---|---|
0 | 1 | 100 | 100 | 2023-02-03 07:42:02.832929 | 2023-02-03 07:42:02.838674 | 0.005745 | INSERT | MSSQL |
1 | 1 | 100 | 100 | 2023-02-03 07:42:02.853426 | 2023-02-03 07:42:02.853426 | 0.000000 | SELECT | MSSQL |
2 | 1 | 300 | 200 | 2023-02-03 07:42:02.855421 | 2023-02-03 07:42:02.865166 | 0.009745 | INSERT | MSSQL |
3 | 1 | 300 | 200 | 2023-02-03 07:42:02.866132 | 2023-02-03 07:42:02.867129 | 0.000997 | SELECT | MSSQL |
4 | 1 | 600 | 300 | 2023-02-03 07:42:02.868158 | 2023-02-03 07:42:02.883086 | 0.014928 | INSERT | MSSQL |
merged_df = pd.concat([SQLite_data.loc[:, ~SQLite_data.columns.isin(['file_size(KB)'])], MSSQL_data])
merged_df = merged_df.loc[merged_df['time_diff'] < 300] #this step is needed to remove outliers
merged_df_INSERT = merged_df.loc[merged_df['operation'] == 'INSERT']
merged_df_SELECT = merged_df.loc[merged_df['operation'] == 'SELECT']
# 3. Results
# Plot the lines on two facets, source https://seaborn.pydata.org/examples/faceted_lineplot.html
sns.relplot(
data=merged_df_INSERT,
x="inserted_rows",
y="time_diff",
col="source",
kind="line",
hue="cols",
height=5,
aspect=.75,
facet_kws=dict(sharex=False),
alpha = 0.6
)
<seaborn.axisgrid.FacetGrid at 0x25d0a8632b0>
sns.relplot(
data=merged_df_SELECT,
x="rows",
y="time_diff",
col="source",
kind="line",
hue="cols",
height=5,
aspect=.75,
facet_kws=dict(sharex=False),
alpha = 0.6
).set_xticklabels(['','0', '20K', '40K', '60K', '80K', '100K', '120K', ''],rotation=30)
<seaborn.axisgrid.FacetGrid at 0x25d0a863e80>
The results indicates that the MSSQL script and library has a much better performance than the SQLite script. It seems that using SQLite provides portability at the expense of performance.
Pandas. McKinney, W., & others. (2010). Data structures for statistical computing in python. In Proceedings of the 9th Python in Science Conference (Vol. 445, pp. 51–56).
Seaborn. Waskom, M., Botvinnik, Olga. Kane, Drew, Hobson, Paul, Lukauskas, Saulius, Gemperline, David C, … Qalieh, Adel. (2017). mwaskom/seaborn: v0.8.1 (September 2017). Zenodo. https://doi.org/10.5281/zenodo.883859